    Exploring Restart Distributions

    We consider the generic approach of using an experience memory to help exploration by adapting a restart distribution. That is, given the capacity to reset the state with those corresponding to the agent's past observations, we help exploration by promoting faster state-space coverage via restarting the agent from a more diverse set of initial states, as well as allowing it to restart in states associated with significant past experiences. This approach is compatible with both on-policy and off-policy methods. However, a caveat is that altering the distribution of initial states could change the optimal policies when searching within a restricted class of policies. To reduce this unsought learning bias, we evaluate our approach in deep reinforcement learning which benefits from the high representational capacity of deep neural networks. We instantiate three variants of our approach, each inspired by an idea in the context of experience replay. Using these variants, we show that performance gains can be achieved, especially in hard exploration problems.
    Comment: RLDM 201
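The restart idea in this abstract can be illustrated with a minimal sketch. Everything named here (`RestartMemory`, `choose_start_state`, the `restart_prob` parameter, and the uniform and priority-weighted sampling rules) is a hypothetical illustration, not the paper's three variants, which the abstract does not specify.

```python
import random


class RestartMemory:
    """Hypothetical buffer of previously visited states used as restart points."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.states = []
        self.priorities = []
        self.rng = random.Random(seed)

    def add(self, state, priority=1.0):
        # Drop the oldest entry once the buffer is full (assumed FIFO eviction).
        if len(self.states) >= self.capacity:
            self.states.pop(0)
            self.priorities.pop(0)
        self.states.append(state)
        self.priorities.append(priority)

    def sample_uniform(self):
        # Favors diversity: every stored state is equally likely.
        return self.rng.choice(self.states)

    def sample_prioritized(self):
        # Favors "significant" states: probability proportional to priority.
        return self.rng.choices(self.states, weights=self.priorities, k=1)[0]


def choose_start_state(memory, default_start, restart_prob, rng):
    """Mix the environment's own start state with a restart from memory."""
    if memory.states and rng.random() < restart_prob:
        return memory.sample_prioritized()
    return default_start
```

Keeping `restart_prob` below 1 retains some episodes from the original start state, which is one simple way to limit the shift in the initial-state distribution that the abstract warns about.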

    Development of an open technology sensor suite for assisted living: a student-led research project.

    Many countries have a rapidly ageing population, placing strain on health services and creating a growing market for assistive technology for older people. We have, through a student-led, 12-week project for 10 students from a variety of science and engineering backgrounds, developed an integrated sensor system to enable older people, or those at risk, to live independently in their own homes for longer, while providing reassurance for their family and carers. We provide details on the design procedure and performance of our sensor system and the management and execution of a short-term, student-led research project. Detailed information on the design and use of our devices, including a door sensor, power monitor, fall detector, general in-house sensor unit and easy-to-use location-aware communications device, is given, with our open designs being contrasted with closed proprietary systems. A case study is presented for the use of our devices in a real-world context, along with a comparison with commercially available systems. We discuss how the system could lead to improvements in the quality of life of older users and increase the effectiveness of their associated care network. We reflect on how recent developments in open source technology and rapid prototyping increase the scope and potential for the development of powerful sensor systems and, finally, conclude with a student perspective on this team effort and highlight learning outcomes, arguing that open technologies will revolutionize the way in which technology will be deployed in academic research in the future.
    This is the final version of the article. It first appeared from Royal Society Publishing via http://dx.doi.org/10.1098/rsfs.2016.001

    Scaling All-Goals Updates in Reinforcement Learning Using Convolutional Neural Networks

    Being able to reach any desired location in the environment can be a valuable asset for an agent. Learning a policy to navigate between all pairs of states individually is often not feasible. An all-goals updating algorithm uses each transition to learn Q-values towards all goals simultaneously and off-policy. However, the cost of performing so many updates in parallel has so far limited the approach to small tabular cases. To tackle this problem, we propose using convolutional network architectures to generate Q-values and updates for a large number of goals at once. We demonstrate the accuracy and generalization qualities of the proposed method on randomly generated mazes and Sokoban puzzles. In the case of on-screen goal coordinates, the resulting mapping from frames to distance-maps directly informs the agent about which places are reachable and in how many steps. As an example application, we show that replacing the random actions in ε-greedy exploration with several actions towards feasible goals generates better exploratory trajectories in the Montezuma's Revenge and Super Mario All-Stars games.
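The tabular all-goals update that this abstract builds on can be sketched as follows. This is a minimal illustration only: the reward of 1 on reaching the goal, the termination at the goal, the table layout `q[g][s][a]`, and the helper names are assumptions, and the convolutional architecture proposed in the paper is not shown.

```python
def make_q(n_states, n_actions):
    """Build a zero-initialized table q[goal][state][action] (every state is a goal)."""
    return [[[0.0] * n_actions for _ in range(n_states)] for _ in range(n_states)]


def all_goals_update(q, s, a, s_next, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update per goal from a single transition (s, a, s_next).

    Assumed reward: 1 when s_next is the goal (the episode ends there), else 0.
    """
    for g in range(len(q)):
        if s_next == g:
            target = 1.0  # goal reached, no bootstrapping past a terminal state
        else:
            target = gamma * max(q[g][s_next])  # off-policy bootstrap for goal g
        q[g][s][a] += alpha * (target - q[g][s][a])
    return q


q = make_q(n_states=3, n_actions=2)
all_goals_update(q, s=0, a=1, s_next=1)  # directly teaches how to reach goal 1
all_goals_update(q, s=2, a=0, s_next=0)  # also propagates value for goal 1 via state 0
```

With this assumed reward, converged values decay by a factor of `gamma` per step of distance to the goal, which is one way the learned values can encode how many steps away a goal is. The loop over all goals on every transition is the cost that the abstract says limited the method to small tabular cases.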